Statistical principal components analysis for retrieval experiments

Author

  • Bekir Taner Dinçer
Abstract

© 2007 Wiley Periodicals, Inc. • Published online 22 January 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20537

… three fundamental components: a set of documents, a set of posed information needs, and a set of relevance judgments. Relevance judgments identify, for each information need, the documents that should be retrieved. A posed information need is a query that may be formulated by any inquirer (user). In this paradigm, relevance is the sole effectiveness criterion. Effectiveness is measured along two dimensions: the ability to retrieve documents that are known to be relevant, and the ability to suppress documents that are known to be nonrelevant. Most effectiveness measures in current use are based on precision and recall. Precision is the proportion of retrieved documents that are relevant; recall is the proportion of relevant documents that are retrieved. This experimental design paradigm has been in use for over four decades and is still used in almost all large-scale experimental evaluation efforts. In the traditional evaluation of retrieval experiments, system performance is measured over a set of queries (also called information needs or topics). Comparing retrieval strategies across all predefined information needs requires a single summary measure. Each strategy's final summary score is therefore calculated as the average of its performance scores on all topics. The mean average precision (MAP) is the most widely used such summary measure.
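The two set-based definitions of precision and recall can be made concrete with a minimal Python sketch. It assumes that a retrieved set and a set of relevance judgments are given as document identifiers. The function names, the toy document IDs, and the zero-returning conventions for empty sets are illustrative assumptions, not taken from the paper.

```python
from typing import Iterable, Set


def precision(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Proportion of retrieved documents that are relevant."""
    retrieved_set = set(retrieved)  # duplicate identifiers are counted once
    if not retrieved_set:
        return 0.0  # assumed convention: nothing retrieved -> precision 0
    return len(retrieved_set & relevant) / len(retrieved_set)


def recall(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Proportion of relevant documents that are retrieved."""
    if not relevant:
        return 0.0  # assumed convention: no relevant documents -> recall 0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical topic: four documents retrieved, three judged relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3", "d7"}
print(precision(retrieved, relevant))  # 0.5
print(recall(retrieved, relevant))     # 0.6666666666666666
```

Here two of the four retrieved documents are relevant (precision 2/4). Two of the three relevant documents are retrieved (recall 2/3). The two numbers capture the two dimensions described above: retrieving relevant documents and suppressing nonrelevant ones.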
A MAP score for a particular retrieval strategy is the mean of the uninterpolated average precisions observed on all topics. In turn, the uninterpolated average precision of a ranked document set retrieved by a particular strategy is computed as follows. A precision score is calculated at each relevant document encountered while reading the ranking from the top, and these scores are averaged (van Rijsbergen, 1979). For visual performance comparisons, a recall-precision graph can also be used (van Rijsbergen, 1979). To construct a recall-precision graph for a number of different retrieval strategies, all individual recall-precision curves of those …
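The two-level definition of MAP (average precision per topic, then the mean over topics) can be sketched in Python as follows. A few assumptions apply. Each run is a ranked list of document identifiers without duplicates. Relevance judgments ("qrels") are a mapping from topic to the set of relevant documents. When some relevant documents are never retrieved, the sketch divides by the total number of relevant documents, so missing documents contribute zero. This is the convention used by trec_eval and is an assumption here, not a detail stated in the paper. The names `average_precision`, `mean_average_precision`, `runs`, and `qrels` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Uninterpolated average precision for one topic.

    Precision is taken at the rank of each relevant document, reading the
    ranking from the top. The sum is divided by the total number of relevant
    documents (assumed trec_eval-style convention), so relevant documents
    that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]],
                           qrels: Dict[str, Set[str]]) -> float:
    """MAP: mean of per-topic average precision over all judged topics.

    A topic with judgments but no ranking in `runs` scores 0.
    Raises statistics.StatisticsError if `qrels` is empty.
    """
    return mean(average_precision(runs.get(topic, []), rels)
                for topic, rels in qrels.items())


# Hypothetical run and judgments for two topics.
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6"]}
qrels = {"t1": {"d1", "d3", "d7"}, "t2": {"d6"}}
print(round(average_precision(runs["t1"], qrels["t1"]), 4))  # 0.5556
print(round(mean_average_precision(runs, qrels), 4))         # 0.5278
```

For topic t1, relevant documents appear at ranks 1 and 3, giving precisions 1/1 and 2/3. Dividing their sum by the three relevant documents gives 5/9. Topic t2 yields 1/2, so MAP is (5/9 + 1/2) / 2 = 19/36. Dividing by the number of relevant documents, rather than by the number retrieved, penalizes a strategy that never reaches some relevant documents.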

Similar resources

Persian Handwriting Analysis Using Functional Principal Components

Principal components analysis is a well-known statistical method for dealing with large dependent data sets. It is also used with functional data, both for data reduction and for representing variation. On the other hand, handwriting is one of the objects studied in various statistical fields, such as pattern recognition and shape analysis. Considering time as the argument,...

Non-negative bases in spectral image archiving

This thesis proposes an application of Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF) and Non-negative Tensor Factorization (NTF) to digital image archiving. It aims to develop new efficient methods for spectral image acquisition, compression and retrieval. It hypothesizes that non-negative bases are more suitable for spectral archiving beside convenient or...

Applying Discrete PCA in Data Analysis

Methods for the analysis of principal components in discrete data have existed for some time under various names, such as grade of membership modelling, probabilistic latent semantic analysis, and genotype inference with admixture. In this paper we explore a number of extensions to the common theory and present some applications of these methods to common statistical tasks. We show that these m...

Functional Analysis of Iranian Temperature and Precipitation by Using Functional Principal Components Analysis

Extended Abstract. When data are in the form of continuous functions, they may challenge classical methods of data analysis based on arguments in finite-dimensional spaces, and therefore need theoretical justification. The infinite dimensionality of the spaces that such data belong to leads to major statistical methodologies and new insights for analyzing them, which is called functional data analysis (FDA...

Analysis of physiochemical and microbial quality of waters of the Karkheh River in southwestern Iran using multivariate statistical methods

Rapid population growth as well as agricultural and industrial development have increased the contamination of Iranian rivers. This study utilized principal components analysis (PCA) to determine the degree of significance of qualitative parameters of water resources in the Karkheh River in southwestern Iran. Cluster analysis (CA) grouped the monitoring stations based on the water quality data ...

Journal title:
  • JASIST

Volume: 58  Issue:

Pages: -

Publication date: 2007